Blind quality assessment of multi-exposure fused images considering the detail, structure and color characteristics

During multi-exposure image fusion (MEF), various distortions inevitably arise and degrade visual quality, so it is essential to predict the visual quality of MEF images. In this work, a novel blind image quality assessment (IQA) method that considers detail, structure, and color characteristics is proposed for MEF images. Specifically, to better perceive detail and structure distortion, the MEF image is decomposed by joint bilateral filtering into two layers, i.e., the energy layer and the structure layer. The decomposition is symmetric in that the two resulting layers can independently and almost completely describe the information of the MEF image. Since the former layer contains rich intensity information and the latter captures image structures, energy-related and structure-related features are extracted from these two layers to perceive detail and structure distortion. In addition, color-related features are obtained to represent color degradation and are combined with the energy-related and structure-related features for quality regression. Experimental results on a public MEF image database demonstrate that the proposed method outperforms state-of-the-art quality assessment methods.


Introduction
Multi-exposure fusion (MEF) images are obtained from a series of images with different exposure levels and can offer richer information than any of the input images [1]. Compared with high dynamic range (HDR) imaging, MEF bypasses the tone mapping step required for display on common devices while achieving imaging effects similar to those of HDR images. However, no existing MEF method achieves satisfactory results for all multi-exposure image sequences [2,3]. MEF images often suffer from detail loss, structure destruction, and color degradation, which affect their visual quality. Therefore, it is necessary to develop MEF image quality assessment (IQA) methods for comparing the performance of different MEF methods. According to the availability of reference information, IQA methods can be categorized into three types [4]: full-reference (FR), reduced-reference (RR), and no-reference (NR) methods. FR and RR methods need reference information for quality prediction, whereas NR methods do not require any reference information. However, because of the way MEF images are acquired, there are multiple reference images that cannot be directly compared with the fused image. Hence, NR methods are suitable for MEF images.
In this paper, considering the detail, structure, and color distortion of MEF images, an NR-IQA method for MEF images is proposed based on joint bilateral filtering. Specifically, the MEF image is decomposed into an energy layer and a structure layer by the joint bilateral filter, and the two decomposed layers are used to extract quality-sensitive features that describe the distortions. Meanwhile, color-related features are extracted to perceive color distortion. The contributions of this paper are as follows.
1. Inspired by the image fusion strategy, joint bilateral filtering is utilized for MEF image decomposition to obtain the energy layer and structure layer, which can better facilitate the extraction of distortion features.
2. Considering the detail and structure distortion of MEF images, energy-related and structure-related features that are relevant to human visual perception are extracted from the two decomposition layers, respectively.
3. To account for color distortion, color-related features are also extracted. To obtain better quality prediction performance, feature selection is conducted based on random forest. Experimental results show that the proposed method outperforms other competing methods.
The remainder of this paper is arranged as follows. Section 2 reviews related work on MEF and IQA methods. Section 3 describes the proposed method in detail. Section 4 presents the experimental results and analyses, and Section 5 draws conclusions.

Related works
This section reviews existing multi-exposure image fusion methods and image quality assessment methods.

Multi-exposure image fusion methods
Since an MEF image can inherit the advantages of multiple images with different exposures, MEF methods have received a lot of attention. Burt et al. [5] utilized the Laplacian pyramid decomposition and the pyramid decomposition for binocular and MEF images, respectively. Goshtasby et al. [6] partitioned the multiple images into uniform blocks and chose the block with the most information at each location for fusion. Mertens et al. [7] calculated weight maps from color saturation and contrast to blend multiple exposures. Raman et al. [8] designed a bilateral-filter-based fusion method. Li et al. [9] introduced a quadratic optimization-based fusion method whose main contribution is extracting fine details from a vector field. Gu et al. [10] fused multi-exposure images in the gradient field, since the human visual system (HVS) is sensitive to contrast, which can be represented by local gradients. Li et al. [11,12] proposed two fast and effective methods: the first is a two-step method consisting of weight map calculation and fused image construction, and the second decomposes the image into a base layer and a detail layer to make full use of spatial information. Ma et al. [13] used a gradient ascent-based algorithm to perform iterative optimization for MEF, and Ma et al. [14] further designed a deep learning-based method to obtain better results.

Image quality assessment methods
As mentioned in the introduction, IQA methods can be divided into FR, RR, and NR methods [15], of which NR methods are the most practical in real applications. Many NR methods for ordinary images rely on natural scene statistics (NSS). For example, Moorthy et al. [16] used NSS to design the distortion identification-based image verity and integrity evaluation (DIIVINE) method. Saad et al. [17] built an NSS model in the discrete cosine transform domain, named BLIINDS-II. Mittal et al. [18] proposed BRISQUE, which is based on features extracted from NSS distributions. Liu et al. [19] extracted features in the curvelet transform domain (CurveletQA). Fang et al. [20] argued that the natural statistical characteristics of contrast-distorted images differ from those of normal images and designed a method called ContrastQA. Since the gradient represents the structure information of an image, Xue et al. [21] combined the Laplacian of Gaussian response and the gradient magnitude map to sense structure distortion (GradLog). Li et al. [22] described structure distortion with the gradient-weighted histogram of local binary patterns (GWH-GLBP). Liu et al. [23] considered the role of oriented gradients (OG) in quality prediction. Besides, Gu et al. [24] proposed NIQMC, which considers contrast distortion from both local and global perspectives. Jiang et al. [25] predicted quality from a set of histogram, entropy, and structure features, which also characterize contrast. Oszust [26] described images with derivatives of different orders. All of the above methods are designed for ordinary images, and NR-IQA methods for MEF images are still lacking.
For the FR assessment of MEF images, Ma et al. [27] measured structure consistency and luminance consistency based on the structural similarity index measure (SSIM). Xing et al. [28] built a multi-scale contrast-based model, inspired by the fact that the HVS is highly sensitive to contrast. Martinez et al. [29] used two stages: multi-scale computation and a structural similarity score. Xu et al. [30] generated local and global intermediate references from the multiple input images and extracted features from them to reflect the visual quality of fused images. In practice, the visual appearance of MEF images is similar to that of tone-mapped images, for which several NR-IQA methods exist. For instance, Gu et al. [31] developed the blind tone-mapped quality index (BTMQI), which takes information, naturalness, and structure into account. Inspired by the NSS model, Kundu et al. [32] measured differential NSS and proposed the high dynamic range image gradient-based evaluator (HIGRADE). Yue et al. [33] simulated color information processing in the human brain to propose a quality assessment method, and they further presented another method [34] based on color, naturalness, and structure. Jiang et al. [35] took human viewing habits into account and evaluated the quality of tone-mapped images from local and global perspectives. Wang et al. [36] also combined local degradation characteristics with global statistical properties. Although several NR-IQA methods exist for ordinary and tone-mapped images, NR-IQA methods designed specifically for MEF images are still needed.

The proposed method
In this section, the proposed method is presented in detail, and its framework is shown in Fig 1. Specifically, the joint bilateral filter is used to decompose the MEF image, and the resulting energy and structure layers are used to characterize the detail and structure distortion of the MEF image, respectively. In addition, color-related features are extracted to measure colorfulness degradation. Finally, the contributions of all features are assessed with a random forest to maximize quality prediction performance.

Image decomposition
Averaging techniques such as the average filter and other low-pass filters are widely used for image decomposition. However, these schemes inevitably cause some loss of detail, which would further damage an MEF image that may already be degraded. To separate the complementary information within the MEF image, a two-scale decomposition scheme based on the joint bilateral filter is used. Since intensity information and structure are important for a well-preserved MEF image, the image is decomposed into two layers, i.e., the energy layer and the structure layer. The process consists of global blur and edge recovery.
The purpose of global blur is to propagate as much of the detail information of the MEF image as possible into the structure layer. Given an MEF image $I$, Gaussian filtering is first applied to produce the blurred image $F_\sigma$:

$$F_\sigma = I \otimes G_\sigma, \qquad G_\sigma(x,y) = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{x^2+y^2}{2\sigma^2}\right),$$

where $\otimes$ denotes convolution and $G_\sigma(x,y)$ is the Gaussian filter with variance $\sigma^2$. Then, to generate the globally blurred MEF image $G$, a weighted-average Gaussian filter is applied:

$$G(m) = \frac{1}{N_m}\sum_{n\in S(m)}\exp\left(-\frac{\|m-n\|^2}{2\sigma_s^2}\right) I(n),$$

where $m$ and $n$ denote pixel coordinates, $S(m)$ is the set of neighboring pixels of $m$, $\sigma_s$ is the standard deviation, and $N_m$ is the normalization term (the sum of the weights).
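The weighted-average blur that produces $G$ can be sketched in pure Python as follows. This is a minimal illustration, not the authors' implementation: it assumes a grayscale image stored as a list of rows of floats, a square window $S(m)$ with a hypothetical radius of about $2\sigma_s$, and a per-pixel normalization $N_m$, so pixels near the border only average over neighbors that fall inside the image.

```python
import math


def gaussian_weight(dy, dx, sigma_s):
    """Spatial weight exp(-||m - n||^2 / (2 sigma_s^2)) for an offset (dy, dx)."""
    return math.exp(-(dy * dy + dx * dx) / (2.0 * sigma_s * sigma_s))


def global_blur(img, sigma_s=2.0, radius=None):
    """Normalized Gaussian weighted average G(m) over a square window S(m).

    img: list of rows of floats (grayscale). The window radius is an
    assumption (about 2 sigma_s) because the text does not specify it.
    """
    if radius is None:
        radius = int(math.ceil(2 * sigma_s))
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc, norm = 0.0, 0.0  # weighted sum and N_m for pixel m = (y, x)
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:  # n must lie inside the image
                        wgt = gaussian_weight(dy, dx, sigma_s)
                        acc += wgt * img[yy][xx]
                        norm += wgt
            out[y][x] = acc / norm
    return out
```

A call such as `global_blur(I, sigma_s=2.0)` returns the blurred guide; a constant image is left unchanged because the weights are normalized per pixel.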
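Subsequently, the joint bilateral filter is introduced to recover the large structures of the MEF image that were destroyed by the above smoothing, yielding the energy layer $I_E$:

$$I_E(m) = \frac{1}{N_m}\sum_{n\in S(m)} g_d(m,n)\, g_r\big(G(m),G(n)\big)\, I(n),$$

$$g_d(m,n) = \exp\left(-\frac{\|m-n\|^2}{2\sigma_s^2}\right), \qquad g_r\big(G(m),G(n)\big) = \exp\left(-\frac{\|G(m)-G(n)\|^2}{2\sigma_r^2}\right),$$

where $N_m = \sum_{n\in S(m)} g_d(m,n)\, g_r\big(G(m),G(n)\big)$ is the normalization term, and $g_d$ and $g_r$ are the spatial distance function and the intensity range function, which set the weights according to the distance between pixels and their intensity difference, respectively. $\sigma_s$ and $\sigma_r$ control the spatial and range weights of the bilateral filter [37], respectively.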
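The edge-recovery step can be sketched as a direct (slow) joint bilateral filter. This sketch assumes, as in rolling-guidance-style filtering, that the range weights come from the blurred guide $G$ while the filtered values come from $I$, and that the structure layer is the residual $I_S = I - I_E$; the function names and default parameters are hypothetical.

```python
import math


def joint_bilateral(img, guide, sigma_s=2.0, sigma_r=0.1, radius=None):
    """Joint bilateral filter: weights from `guide`, values from `img`.

    I_E(m) = (1 / N_m) * sum_{n in S(m)} g_d(m, n) * g_r(G(m), G(n)) * I(n)
    """
    if radius is None:
        radius = int(math.ceil(2 * sigma_s))  # assumed window size
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc, norm = 0.0, 0.0
            gc = guide[y][x]
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if not (0 <= yy < h and 0 <= xx < w):
                        continue
                    g_d = math.exp(-(dy * dy + dx * dx) / (2 * sigma_s ** 2))
                    diff = guide[yy][xx] - gc
                    g_r = math.exp(-(diff * diff) / (2 * sigma_r ** 2))
                    acc += g_d * g_r * img[yy][xx]
                    norm += g_d * g_r  # N_m; never zero since the centre weight is 1
            out[y][x] = acc / norm
    return out


def decompose(img, guide, **kw):
    """Return (I_E, I_S), assuming the structure layer is I_S = I - I_E."""
    energy = joint_bilateral(img, guide, **kw)
    structure = [[i - e for i, e in zip(ri, re)] for ri, re in zip(img, energy)]
    return energy, structure
```

Because the range kernel is evaluated on the guide, a strong edge in $G$ suppresses averaging across it, which is what lets the filter restore large structures that the global blur removed.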
For an MEF image $I$, its structure layer $I_S$ is calculated as the residual

$$I_S = I - I_E.$$

Fig 2 shows three MEF images with different MOS values together with their energy and structure layers. Fig 2(A) has a dim appearance and the lowest MOS value; it loses some detail, which is also reflected in its energy and structure layers. The visual quality of Fig 2(B) is at a normal level: it preserves details and structural information well, and its MOS value is moderate. The detail and structural information of Fig 2(C) appear similar to those of Fig 2(B), but its color information is clearly richer, which gives it the highest MOS value. The similarity between Fig 2(B) and 2(C) is reflected in their energy layers, whereas the small differences in detail and structure between them are revealed by their structure layers. These observations indicate that detail, structure, and color information are all important for a high-quality MEF image. Therefore, energy-related, structure-related, and color-related features are extracted to describe the distortion of MEF images.

Energy-related features
The energy layer $I_E$ contains the details of the MEF image. After decomposition, $I_E$ appears slightly blurred, and different energy layers exhibit different degrees of blurring. Therefore, contrast values are calculated from the edge blocks of $I_E$.
First, $I_E$ is divided into non-overlapping blocks of size k×k, with k = 64 as suggested in [38]. To determine whether a block is an edge block or a flat block, the Sobel operator is applied for edge detection and the proportion of edge pixels in the block is calculated; if this proportion exceeds 0.2%, the block is regarded as an edge block. The contrast feature $F_c$ is then defined as

$$F_c = \frac{1}{T}\sum_{t=1}^{T}\sqrt{\frac{1}{k^2}\sum_{i=1}^{k}\sum_{j=1}^{k}\left(I'_{E,t}(i,j)-\bar{I}_E\right)^2},$$

where $I'_{E,t}(i,j)$ is the (i, j)-th element of the t-th edge block of size k×k, T is the number of edge blocks in $I_E$, and $\bar{I}_E$ is the average intensity of all pixels in $I_E$. In addition, let $F_e$ be the energy feature, which represents the amount of detail information and is computed from the pixel values $I_E(i, j)$ of $I_E$ at positions (i, j). Entropy can also reveal the loss of detail information, so the entropy of $I_E$ is calculated to complement $F_c$ and $F_e$. The entropy feature $F_t$ is

$$F_t = -\sum_{l} p(l)\log_2 p(l),$$

where l is a pixel value of $I_E$ and p(l) is its probability.
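The three energy-related features can be sketched as below. Several details are assumptions rather than statements from the text: intensities lie in [0, 1], an "edge pixel" is one whose Sobel magnitude exceeds a hypothetical `edge_thresh`, only full k×k blocks are used, $F_c$ follows the RMS-contrast form given above, $F_e$ is taken as the mean squared intensity (the exact form of $F_e$ is not stated here), and the entropy uses a 256-bin histogram.

```python
import math


def sobel_magnitude(img, y, x):
    """Sobel gradient magnitude at an interior pixel (y, x)."""
    gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
          - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
    gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
          - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
    return math.hypot(gx, gy)


def is_edge_block(img, y0, x0, k, edge_thresh, ratio=0.002):
    """Edge block if more than 0.2 % of its pixels are Sobel edge pixels."""
    h, w = len(img), len(img[0])
    edges = 0
    for y in range(y0, y0 + k):
        for x in range(x0, x0 + k):
            # image-border pixels have no full 3x3 neighbourhood and are skipped
            if 0 < y < h - 1 and 0 < x < w - 1 and sobel_magnitude(img, y, x) > edge_thresh:
                edges += 1
    return edges / (k * k) > ratio


def energy_features(img, k=64, edge_thresh=0.5, levels=256):
    """Return (F_c, F_e, F_t) for an energy layer with values in [0, 1]."""
    h, w = len(img), len(img[0])
    n = h * w
    mean = sum(map(sum, img)) / n  # global mean of I_E
    # F_c: RMS deviation from the global mean, averaged over the T edge blocks
    contrasts = []
    for y0 in range(0, h - k + 1, k):
        for x0 in range(0, w - k + 1, k):
            if is_edge_block(img, y0, x0, k, edge_thresh):
                sq = sum((img[y][x] - mean) ** 2
                         for y in range(y0, y0 + k) for x in range(x0, x0 + k))
                contrasts.append(math.sqrt(sq / (k * k)))
    f_c = sum(contrasts) / len(contrasts) if contrasts else 0.0
    # F_e: assumed to be the mean squared intensity of I_E
    f_e = sum(v * v for row in img for v in row) / n
    # F_t: Shannon entropy (bits) of the quantized intensity histogram
    hist = [0] * levels
    for row in img:
        for v in row:
            hist[min(levels - 1, int(v * levels))] += 1
    f_t = -sum(c / n * math.log2(c / n) for c in hist if c)
    return f_c, f_e, f_t
```

For a half-black, half-white test image every block straddling the edge is an edge block, and the entropy of the two-level histogram is exactly one bit.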
Through the above calculations, F c , F e and F t form a feature group, i.e., energy-related features.

Structure-related features
Generally, edges, corners, and textures are important for image structure. The structure tensor-based saliency detection operator of [39] detects gradient information efficiently, and its result is denoted by $S_T$. To strengthen this gradient feature map, the neighbor energy $N_E$ is calculated as

$$N_E(x,y) = \sum_{a=-v}^{v}\sum_{b=-v}^{v} S_T(x+a,\,y+b)^2,$$

where (x, y) denotes the central pixel position in $I_S$, v controls the window size, and (2v+1)×(2v+1) is the size of the neighborhood.
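A minimal sketch of the neighbor energy, assuming the sum-of-squares form written above and treating pixels outside the map as zero; `neighbor_energy` is a hypothetical helper name and $S_T$ is assumed to be already available as a list of rows.

```python
def neighbor_energy(s_t, v=1):
    """Sum of squared S_T values in a (2v+1) x (2v+1) window around each pixel.

    Out-of-image neighbours contribute nothing (zero padding assumption).
    """
    h, w = len(s_t), len(s_t[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in range(-v, v + 1):
                for dx in range(-v, v + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        total += s_t[yy][xx] ** 2
            out[y][x] = total
    return out
```

On an all-ones 3×3 map with v = 1, the centre receives 9, an edge pixel 6, and a corner 4, which shows how the window accumulates local gradient energy.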
Then, the enhanced gradient feature map $S_{NE}$ is calculated from $S_T$ and $N_E$. To quantify the differences between $S_{NE}$ maps, singular value decomposition (SVD) is used to extract the primary information of $S_{NE}$, denoted as $S_{svd}(t),\ t\in\{1,2,\ldots,q\}$, where q is the number of SVD diagonal coefficients. The distributions of $S_{svd}$ are shown in the second row of Fig 3. Different $S_{svd}$ values exhibit different but regular distributions, which can be captured by the Weibull distribution model:

$$f(x;\chi,\varepsilon) = \frac{\chi}{\varepsilon}\left(\frac{x}{\varepsilon}\right)^{\chi-1}\exp\left[-\left(\frac{x}{\varepsilon}\right)^{\chi}\right],\quad x\ge 0,$$

where χ and ε denote the shape and scale parameters, respectively. These parameters constitute the first group of structure-related features, denoted as $F_w$.
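The Weibull parameters can be estimated from the singular values, for example by maximum likelihood; the text does not say which estimator is used, so MLE is an assumption here. The singular values themselves are assumed to be computed elsewhere (with NumPy, `numpy.linalg.svd(S_NE, compute_uv=False)` returns them). The shape χ solves $1/\chi + \overline{\ln x} - \sum x^\chi \ln x / \sum x^\chi = 0$, whose left side decreases monotonically in χ, so bisection suffices.

```python
import math
import random


def fit_weibull(samples, lo=0.05, hi=50.0, iters=100):
    """Maximum-likelihood Weibull fit; returns (shape chi, scale eps)."""
    xs = [x for x in samples if x > 0]  # the Weibull support is x > 0
    logs = [math.log(x) for x in xs]
    n = len(xs)
    mean_log = sum(logs) / n
    lmax = max(logs)  # shift exponents so x^c never overflows

    def score(c):
        # 1/c + mean(ln x) - sum(x^c ln x) / sum(x^c), decreasing in c
        pw = [math.exp(c * (l - lmax)) for l in logs]
        return 1.0 / c + mean_log - sum(p * l for p, l in zip(pw, logs)) / sum(pw)

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid
    chi = 0.5 * (lo + hi)
    # scale: eps = (mean x^chi)^(1/chi), evaluated in the shifted domain
    mean_pw = sum(math.exp(chi * (l - lmax)) for l in logs) / n
    eps = math.exp(lmax) * mean_pw ** (1.0 / chi)
    return chi, eps
```

For synthetic data drawn with `random.Random(0).weibullvariate(2.0, 1.5)` (scale 2.0, shape 1.5), the fit recovers parameters close to those values.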
To capture the statistical differences more comprehensively, the moment features variance, skewness, and kurtosis are also extracted to supplement the distribution parameters. Together, they are denoted by $F_m$ and form the second group of structure-related features.
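The moment features can be computed directly from the samples. This sketch uses population moments and Pearson (non-excess) kurtosis; whether the paper uses excess kurtosis or sample-corrected estimators is not stated, so that choice is an assumption.

```python
def moment_features(values):
    """Population variance, skewness and (non-excess) kurtosis of a sample."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    kurt = m4 / m2 ** 2 if m2 > 0 else 0.0
    return m2, skew, kurt
```

For the symmetric sample [1, 2, 3, 4, 5], the variance is 2, the skewness is 0, and the kurtosis is 1.7.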
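In addition to these statistics, the rotation-invariant uniform local binary pattern (LBP) operator is applied to the $S_{NE}$ map to encode texture information:

$$\mathrm{SLBP}^{riu2}_{D,R} = \begin{cases}\sum_{d=0}^{D-1}\rho(s_d - s_c), & U \le 2,\\ D+1, & \text{otherwise},\end{cases}$$

where $s_c$ is the central pixel value, $s_d$ are the D neighbors on a circle of radius R, U is the number of 0/1 transitions in the circular binary pattern, and ρ(·) is the threshold function

$$\rho(x) = \begin{cases}1, & x\ge 0,\\ 0, & x<0.\end{cases}$$

The normalized histogram is then computed as

$$H(u) = \frac{1}{Z}\sum_{x,y}\delta\big(\mathrm{SLBP}^{riu2}_{D,R}(x,y),\,u\big),\quad u\in\{0,1,\ldots,D+1\},$$

where Z is the total number of pixels in the $\mathrm{SLBP}^{riu2}_{D,R}$ map, u is a possible LBP pattern, and δ(·,·) equals 1 when its arguments are equal and 0 otherwise. The histogram features are denoted as $F_h$ and form the third group of structure-related features.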
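The rotation-invariant uniform LBP and its histogram can be sketched as follows. This is a simplified illustration that assumes D = 8 and R = 1 with integer neighbor offsets (no bilinear interpolation), and only codes interior pixels; the actual D and R used by the paper are not stated here.

```python
# 8 neighbours at radius 1, listed in circular order around the centre
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_riu2(img):
    """Rotation-invariant uniform LBP codes (0..D+1) for interior pixels."""
    h, w = len(img), len(img[0])
    d = len(OFFSETS)
    codes = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            c = img[y][x]
            # rho(s_d - s_c): 1 if the neighbour is >= the centre, else 0
            bits = [1 if img[y + dy][x + dx] - c >= 0 else 0 for dy, dx in OFFSETS]
            # U: number of 0/1 transitions around the circle
            u = sum(bits[i] != bits[(i + 1) % d] for i in range(d))
            codes.append(sum(bits) if u <= 2 else d + 1)
    return codes


def lbp_histogram(codes, d=8):
    """Normalized histogram H(u) over the D + 2 possible riu2 patterns."""
    counts = [0] * (d + 2)
    for code in codes:
        counts[code] += 1
    z = len(codes)
    return [c / z for c in counts]
```

A flat region maps to pattern D (all neighbors equal the centre), an isolated bright spot to pattern 0, and an alternating neighborhood to the non-uniform bin D + 1.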

Color-related features
As described in Section 3.1, color information is very important for a high-quality MEF image. Since the channels of the RGB color space are highly correlated, RGB is not well suited to feature extraction. Extracting features in a decorrelated color space reduces the redundancy between features and thus captures visual stimuli more effectively. Therefore, the RGB image is converted into the YUV color space, and the Y, U, and V channels are used for separate feature extraction.
For the Y color channel, the mean, standard deviation, and skewness are calculated as the first, second, and third-order moment statistics. As a result, the first group of color-related features is formed, denoted by F o .
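The color conversion and the Y-channel moments ($F_o$) can be sketched as below. The conversion coefficients are an assumption: this uses the common analog YUV definition with BT.601 luma weights, since the text does not specify which YUV variant is used.

```python
import math


def rgb_to_yuv(r, g, b):
    """Analog YUV with BT.601 luma weights (assumed coefficients)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return y, u, v


def y_channel_moments(pixels):
    """pixels: iterable of (R, G, B). Returns mean, std and skewness of Y."""
    ys = [rgb_to_yuv(r, g, b)[0] for r, g, b in pixels]
    n = len(ys)
    mean = sum(ys) / n
    std = math.sqrt(sum((y - mean) ** 2 for y in ys) / n)
    skew = (sum((y - mean) ** 3 for y in ys) / n) / std ** 3 if std > 0 else 0.0
    return mean, std, skew
```

Achromatic pixels (R = G = B) map to U ≈ V ≈ 0, so the chroma channels carry only color variation, which is the decorrelation the text relies on.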
Besides $F_o$, an NSS model is used to extract features from the U and V channels to represent variations in color information. Specifically, the U and V channel maps are processed by local mean subtraction and divisive normalization to obtain the mean subtracted contrast normalized (MSCN) coefficients:

$$\hat{k}(i,j) = \frac{k(i,j)-\mu(i,j)}{\delta(i,j)+C},\qquad k\in\{U,V\},$$

where $\hat{k}(i,j)$ is the MSCN coefficient of k at position (i, j), μ(i, j) and δ(i, j) are the local mean and standard deviation of k(i, j), respectively, and C is a small positive constant that avoids division by zero.
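The MSCN computation can be sketched with a Gaussian-weighted local window. The window (7×7, σ = 7/6) and C = 1 follow the common BRISQUE convention and are assumptions here, not values stated for this method.

```python
import math


def mscn(channel, sigma=7 / 6, radius=3, c=1.0):
    """MSCN coefficients (k - mu) / (delta + C) using a Gaussian local window."""
    h, w = len(channel), len(channel[0])
    kernel = [[math.exp(-(dy * dy + dx * dx) / (2 * sigma * sigma))
               for dx in range(-radius, radius + 1)]
              for dy in range(-radius, radius + 1)]
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            s = s2 = wsum = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        wt = kernel[dy + radius][dx + radius]
                        v = channel[yy][xx]
                        s += wt * v
                        s2 += wt * v * v
                        wsum += wt
            mu = s / wsum                             # local mean mu(i, j)
            var = max(s2 / wsum - mu * mu, 0.0)       # clamp tiny negative round-off
            out[y][x] = (channel[y][x] - mu) / (math.sqrt(var) + c)
    return out
```

A flat channel yields coefficients of zero everywhere, and a local peak yields a positive coefficient at its centre.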
The color information of different MEF images yields different MSCN coefficient distributions, which can be captured by the generalized Gaussian distribution (GGD):

$$f(o;\alpha,\lambda^2) = \frac{\alpha}{2\beta\,\Gamma(1/\alpha)}\exp\left[-\left(\frac{|o|}{\beta}\right)^{\alpha}\right], \tag{22}$$

where $\beta = \lambda\sqrt{\Gamma(1/\alpha)/\Gamma(3/\alpha)}$, α and λ² are the shape and variance parameters of the GGD, respectively, and Γ(·) is the Gamma function. Finally, the α and λ² values of the U and V channel maps constitute the second group of color-related features, denoted by $F_n$.
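The GGD parameters can be estimated by moment matching, a common approach (used, for example, in BRISQUE) that is assumed here because the text does not name an estimator. For a zero-mean GGD, $E[o^2]/E[|o|]^2 = \Gamma(1/\alpha)\Gamma(3/\alpha)/\Gamma(2/\alpha)^2$, so α is found by matching this ratio on a grid, and λ² is the second moment.

```python
import math
import random


def fit_ggd(coeffs):
    """Moment-matching GGD fit for zero-mean data; returns (alpha, lambda^2)."""
    n = len(coeffs)
    lam_sq = sum(x * x for x in coeffs) / n       # lambda^2: second moment
    mean_abs = sum(abs(x) for x in coeffs) / n
    rho = lam_sq / mean_abs ** 2                  # sample ratio E[o^2] / E[|o|]^2
    best_alpha, best_err = None, float("inf")
    for i in range(200, 10001):                   # alpha grid 0.200 .. 10.000
        alpha = i / 1000.0
        r = math.gamma(1 / alpha) * math.gamma(3 / alpha) / math.gamma(2 / alpha) ** 2
        err = abs(r - rho)
        if err < best_err:
            best_alpha, best_err = alpha, err
    return best_alpha, lam_sq
```

As a sanity check, Gaussian samples give α close to 2 (the ratio is π/2) and Laplacian samples give α close to 1.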

Quality regression
With the energy-related features $F_c$, $F_e$, and $F_t$, the structure-related features $F_w$, $F_m$, and $F_h$, and the color-related features $F_o$ and $F_n$, the whole feature vector is $F_{all} = \{F_c, F_e, F_t, F_w, F_m, F_h, F_o, F_n\}$. A random forest (RF) is then adopted to learn the mapping from $F_{all}$ to the subjective ratings:

$$Q = Z_{RF}(F_{all}),$$

where $Z_{RF}(\cdot)$ is the mapping function learned by RF and Q is the quality score predicted by the trained RF model.

Experimental results
In this section, the proposed method is compared with state-of-the-art IQA methods on an MEF image dataset. In addition, the performance of different features is tested and feature selection is conducted.

1) Dataset: The proposed method is tested on an open MEF image dataset [40], whose details are shown in Table 1. The dataset consists of 136 MEF images generated by eight MEF algorithms, i.e., local energy weighted linear combination, global energy weighted linear combination, Li12 [9], Raman09 [8], ShutaoLi12 [11], Mertens07 [7], Gu12 [10], and ShutaoLi13 [12].

2) Performance criteria: Three common indicators are used to validate the performance of the proposed method: the Pearson linear correlation coefficient (PLCC), the Spearman rank order correlation coefficient (SROCC), and the root mean squared error (RMSE). PLCC evaluates prediction accuracy, SROCC measures prediction monotonicity, and RMSE represents prediction error. For an excellent IQA method, PLCC and SROCC are close to 1 and RMSE is close to 0. They are defined as

$$\mathrm{PLCC} = \frac{\sum_{i=1}^{K}(Q_i-\bar{Q})(M_i-\bar{M})}{\sqrt{\sum_{i=1}^{K}(Q_i-\bar{Q})^2}\sqrt{\sum_{i=1}^{K}(M_i-\bar{M})^2}},$$

$$\mathrm{SROCC} = 1-\frac{6\sum_{i=1}^{K} l_i^2}{K(K^2-1)},$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{K}\sum_{i=1}^{K}(Q_i-M_i)^2},$$

where $Q_i$ and $M_i$ are the predicted quality score and the MOS value of the i-th MEF image, $\bar{Q}$ and $\bar{M}$ are the mean values of $Q_i$ and $M_i$, respectively, $l_i$ is the rank difference of the i-th MEF image between the objective and subjective quality assessments, and K is the number of MEF images.

3) Evaluation protocol: k-fold cross-validation is employed to test the performance of the proposed method. Since the 136 MEF images are generated from 17 source image sequences (17 × 8 = 136), the dataset is divided into 17 subsets accordingly and k is set to 17: in each fold, 16 subsets are used for training the quality model and the remaining subset is used for testing.
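The three criteria and the leave-one-sequence-out folds can be sketched as follows. The closed-form SROCC assumes there are no tied scores, and the image-id layout (sequence index × 8 + algorithm index) is a hypothetical convention for illustration. IQA studies often pass Q through a nonlinear logistic mapping before computing PLCC and RMSE; that step is omitted here.

```python
import math


def plcc(q, m):
    """Pearson linear correlation between predictions q and MOS values m."""
    qm, mm = sum(q) / len(q), sum(m) / len(m)
    num = sum((a - qm) * (b - mm) for a, b in zip(q, m))
    den = math.sqrt(sum((a - qm) ** 2 for a in q) * sum((b - mm) ** 2 for b in m))
    return num / den


def ranks(xs):
    """1-based ranks; assumes no ties, as the closed-form SROCC requires."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def srocc(q, m):
    """Spearman correlation via 1 - 6 * sum(l_i^2) / (K (K^2 - 1))."""
    k = len(q)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(q), ranks(m)))
    return 1 - 6 * d2 / (k * (k * k - 1))


def rmse(q, m):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, m)) / len(q))


def leave_one_sequence_out(n_sequences=17, images_per_sequence=8):
    """Yield (train_ids, test_ids); hypothetical id = seq * 8 + algorithm."""
    total = n_sequences * images_per_sequence
    for test_seq in range(n_sequences):
        test = [test_seq * images_per_sequence + a for a in range(images_per_sequence)]
        test_set = set(test)
        train = [i for i in range(total) if i not in test_set]
        yield train, test
```

Splitting by source sequence keeps all fused versions of one scene in the same fold, so the model is never tested on content it has seen during training.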

Overall performance comparison
To validate the performance of the proposed method, comparative experiments are conducted on the above-mentioned MEF dataset between the proposed method and 13 NR-IQA methods designed for ordinary and tone-mapped images. Ten of these methods are designed for ordinary images, namely DIIVINE [16], BLIINDS-II [17], BRISQUE [18], CurveletQA [19], GradLog [21], ContrastQA [20], GLBP [22], OG [23], NIQMC [24], and SCORER [26]. The remaining three are designed for tone-mapped images: BTMQI [31], HIGRADE-1 [32], and HIGRADE-2 [32]. The PLCC, SROCC, and RMSE results of these methods and the proposed method are shown in Table 2, with the best values highlighted in bold. Table 2 shows that the methods designed for ordinary images perform poorly. Because they mainly describe the distortions of ordinary images, their features do not match the special distortions of MEF images, so they are not suitable for predicting the quality of MEF images. Among them, GradLog performs relatively better, which may be attributed to its use of structure information based on the gradient and the Laplacian of Gaussian response; it can represent the structure loss caused by MEF, and its PLCC and SROCC values reach 0.631 and 0.567, respectively. In addition, the methods designed for tone-mapped images achieve higher PLCC and SROCC values than those designed for ordinary images, mainly because MEF images and tone-mapped images both suffer from underexposure and overexposure. Finally, the proposed method outperforms all of the methods analyzed above. It uses the energy and structure layers decomposed from the MEF image by the joint bilateral filter to perceive detail and structure information, and it also considers color information. In summary, the energy-related, structure-related, and color-related features complement each other.
To illustrate the comparison more intuitively, Fig 4 shows scatter plots of the proposed method and the competing methods. The points of the proposed method converge more tightly than those of the other methods and are better fitted by the logistic function, which indicates that the proposed method performs better than the competing methods.

Ablation study
The overall performance comparison above shows that the proposed method outperforms the methods designed for ordinary and tone-mapped images. However, since the whole feature vector consists of different features, the contribution of each should also be tested. Table 3 lists the results for the individual feature groups and their combinations. It shows that the energy-related, structure-related, and color-related features contribute equally, and that combining them maximizes performance.
Even so, combining multiple classes of features may introduce redundancy. Therefore, RF-based feature selection is used to determine the final feature vector, and the results are listed in Table 4. The 15-dimensional feature vector selected from all of the extracted features achieves the best performance.
Fig 4. https://doi.org/10.1371/journal.pone.0283096.g004

Although the proposed method performs well, there is still room for improvement. Knowledge of the MEF algorithms is also meaningful for MEF-IQA, i.e., the causes of distortion can be considered while mining the distortion characteristics, and the MEF-IQA method can be further improved by incorporating such knowledge. In addition, high-level semantic features could be combined to enhance the feature representation.

Conclusion
In this paper, a novel blind quality assessment method that considers detail, structure, and color characteristics is proposed for MEF images. First, inspired by the image fusion process, the MEF image is decomposed by joint bilateral filtering into two components (i.e., the energy layer and the structure layer) that represent its detail and structure information. Then, corresponding features are extracted from these layers to describe the detail and structure distortion of MEF images. Besides, color space conversion is used to extract color-related features that perceive color degradation. Finally, to achieve the best performance, feature selection is conducted based on the random forest. Experimental results demonstrate the superiority of the proposed method.